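The instruction set architecture (ISA) serves as the fundamental contract between software and hardware. It defines the programmer-visible state and the specific operations the processor executes. The Y86-64 ISA is an educational subset of x86-64 that reduces a complex CISC design to a more tractable model while retaining register-based procedure linkage.
1. Programmer-Visible State
This state comprises the register file (RF), which holds 15 registers; the condition codes (CC), which are used for flow control; the program counter (PC); and the status code (Stat), which indicates normal operation (AOK), a halt (HLT), or an error (ADR/INS).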
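As a rough illustration of the state listed above, the following Python sketch groups the four components into one record. The class and field names (`State`, `rf`, `zf`, `sf`, `of`) and the reset values are assumptions made for this example, not part of the Y86-64 definition; the three condition-code flags and the status code values follow the usual textbook presentation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stat(Enum):
    AOK = 1  # normal operation
    HLT = 2  # halt instruction executed
    ADR = 3  # invalid address encountered
    INS = 4  # invalid instruction encountered


# The 15 program registers, listed in order of their register IDs (0x0-0xE).
REGISTERS = ["%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
             "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14"]


@dataclass
class State:
    # Register file: one 64-bit value per register (reset value is an assumption).
    rf: dict = field(default_factory=lambda: {name: 0 for name in REGISTERS})
    # Condition codes: zero, sign, and overflow flags.
    zf: bool = False
    sf: bool = False
    of: bool = False
    pc: int = 0            # program counter
    stat: Stat = Stat.AOK  # status code


state = State()
state.rf["%rax"] = 42
print(len(state.rf), state.pc, state.stat.name)
```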
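2. CISC and RISC Characteristics
While x86-64 is a traditional CISC design, Y86-64 uses a regular instruction encoding, in which each instruction type has a fixed length, and a strict load/store architecture: memory is accessed only through dedicated move instructions (e.g., rmmovq rA, D(rB)).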
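To make the regular encoding concrete, here is a minimal Python sketch that encodes three of the move instructions. The helper names (`encode_rrmovq`, `encode_irmovq`, `encode_rmmovq`) are hypothetical; the byte layout (an opcode byte, a register-specifier byte, then an optional 8-byte little-endian constant) follows the standard Y86-64 encoding.

```python
# Register IDs used in the register-specifier byte; 0xF means "no register".
REG_ID = {"%rax": 0x0, "%rcx": 0x1, "%rdx": 0x2, "%rbx": 0x3,
          "%rsp": 0x4, "%rbp": 0x5, "%rsi": 0x6, "%rdi": 0x7,
          "%r8": 0x8, "%r9": 0x9, "%r10": 0xA, "%r11": 0xB,
          "%r12": 0xC, "%r13": 0xD, "%r14": 0xE}
RNONE = 0xF


def _reg_byte(ra, rb):
    """Pack two 4-bit register IDs into one byte (rA in the high nibble)."""
    return bytes([(ra << 4) | rb])


def _word(value):
    """Encode a 64-bit constant in little-endian byte order."""
    return (value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")


def encode_rrmovq(ra, rb):
    """rrmovq rA, rB: 2 bytes."""
    return bytes([0x20]) + _reg_byte(REG_ID[ra], REG_ID[rb])


def encode_irmovq(value, rb):
    """irmovq V, rB: 10 bytes; the rA field is unused and set to 0xF."""
    return bytes([0x30]) + _reg_byte(RNONE, REG_ID[rb]) + _word(value)


def encode_rmmovq(ra, disp, rb):
    """rmmovq rA, D(rB): 10 bytes; the only way to store a register to memory."""
    return bytes([0x40]) + _reg_byte(REG_ID[ra], REG_ID[rb]) + _word(disp)


print(encode_rmmovq("%rsp", 0x123456789ABCD, "%rdx").hex(" "))
# -> 40 42 cd ab 89 67 45 23 01 00
```

Each instruction type always produces the same number of bytes, which is what keeps fetch and decode simple even though lengths differ between types.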
QUESTION 1
Modify the sum function (Figure 4.6) to implement absSum using a conditional jump. Which approach is most architecturally sound for Y86-64?
Using jge to skip a subq instruction that negates the value.
Using a call to a separate absolute value function.
Using a memory-to-memory comparison.
Changing the status code to INS if the value is negative.
✅ Correct!
In Y86-64, we test the value (andq %r10, %r10) and then use jge to jump past the subtraction logic if the value is non-negative.
❌ Incorrect
Y86-64 does not support memory-to-memory comparisons or hardware-level status changes for logic control.
QUESTION 2
When implementing absSum with conditional move (cmovXX), how do we handle the sign inversion?
Subtract the value from zero in a temporary register, then cmovl the negative result back.
Use the iaddq instruction to flip the bits.
Y86-64 performs sign inversion automatically during mrmovq.
Conditional moves cannot be used for arithmetic logic.
✅ Correct!
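We compute -x (e.g., using subq to subtract x from zero in a temporary register) and then use cmovl to replace x with its negation if the original value was less than zero. The condition codes must reflect x at that point (e.g., from an andq on x issued after the subq); if they still reflect the subq result -x, the matching condition is cmovg.
❌ Incorrect
cmovXX only moves data; it does not perform arithmetic during the move itself.
QUESTION 3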
What is the byte encoding for the sequence: irmovq $15, %rbx; rrmovq %rbx, %rcx? (Starting at 0x100)
0x100: 30 f3 0f 00 00 00 00 00 00 00; 0x10a: 20 31
0x100: 30 3f 0f 00; 0x104: 20 13
0x100: 60 31; 0x102: 30 f3
0x100: 70 00; 0x102: 20 31
✅ Correct!
irmovq is 10 bytes (30 F rB ValC) and rrmovq is 2 bytes (20 rA rB).
❌ Incorrect
Recall that each Y86-64 instruction type has a fixed length; irmovq always takes 10 bytes, including the 8-byte constant.
QUESTION 4
Determine the HCL code for the control signal mem_write in a SEQ processor.
bool mem_write = icode in { IRMMOVQ, IPUSHQ, ICALL };
bool mem_write = icode in { IMRMOVQ, IPOPQ, IRET };
bool mem_write = (valE == valM);
bool mem_write = stat == AOK;
✅ Correct!
Only instructions that push to the stack or store to memory (rmmovq, pushq, call) trigger a memory write.
❌ Incorrect
IMRMOVQ and IPOPQ perform memory reads, not writes.
QUESTION 5
In the PIPE implementation, when should the signal E_bubble be set?
On mispredicted branches or load-use hazards.
Every time the PC is updated.
Only when the processor hits a HALT instruction.
When the Register File is being read.
✅ Correct!
E_bubble injects a bubble into the execute stage, either to cancel instructions fetched after a mispredicted branch or to fill the gap while earlier stages stall for a load-use hazard.
❌ Incorrect
Bubbling is a specific hazard management technique, not a standard part of every cycle.
Case Study: Architectural Optimization and Logic
Advanced Y86-64 Implementation Details
You are tasked with extending the Y86-64 design. Consider the introduction of the iaddq instruction and the performance limits of a pipelined system with $k$ stages and overhead $T_{overhead}$.
Q
1. [Writing Task] Rewrite the Y86-64 sum function of Figure 4.6 to make use of the iaddq instruction. (Output: ~14 lines).
Solution:
Model Solution:
sum:
    irmovq $0, %rax        # sum = 0
    andq %rsi, %rsi        # set CC
    jmp test               # start test
loop:
    mrmovq (%rdi), %rdx    # get *start
    addq %rdx, %rax        # sum += *start
    iaddq $8, %rdi         # start++ (Optimization!)
    iaddq $-1, %rsi        # count-- (Optimization!)
test:
    jg loop                # if count > 0, loop
    ret                    # return
(Note: This removes the need for registers %r8 and %r9, previously used to store the constants 8 and 1.)
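To sanity-check the control flow of the model solution, the loop can be transliterated into Python one instruction at a time. This is only a sketch: the function name `y86_sum` and the dictionary-based memory are assumptions made for illustration, and 64-bit wraparound is ignored.

```python
def y86_sum(memory, start, count):
    """Mirror the iaddq version of sum; memory maps byte addresses to 8-byte words."""
    rax = 0                  # irmovq $0, %rax
    rdi, rsi = start, count  # arguments arrive in %rdi and %rsi
    # andq %rsi, %rsi sets the condition codes from count, and jmp test
    # enters the loop at its test; jg loop repeats while count > 0.
    while rsi > 0:
        rdx = memory[rdi]    # mrmovq (%rdi), %rdx
        rax += rdx           # addq %rdx, %rax
        rdi += 8             # iaddq $8, %rdi
        rsi -= 1             # iaddq $-1, %rsi (also sets CC for jg)
    return rax               # ret


# Three words stored at consecutive 8-byte addresses (addresses are arbitrary).
array = {0x200: 10, 0x208: 20, 0x210: 30}
print(y86_sum(array, 0x200, 3))  # -> 60
```

The final iaddq doubles as the loop test because it is the last instruction to set the condition codes before jg.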
Q
2. Write HCL code for a circuit that selects the median of word inputs A, B, and C.
Solution:
word median = [
    (A <= B && B <= C) || (C <= B && B <= A) : B;
    (B <= A && A <= C) || (C <= A && A <= B) : A;
    1 : C;
];
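The case expression can be checked by mirroring it in Python and comparing against a sort-based median over a small exhaustive range. The function name `median` and the test range are choices made for this sketch; like an HCL case expression, the branches are evaluated in order and the first match wins.

```python
from itertools import product


def median(a, b, c):
    """Python mirror of the HCL case expression for the median of three words."""
    if (a <= b <= c) or (c <= b <= a):
        return b
    if (b <= a <= c) or (c <= a <= b):
        return a
    return c  # default case, corresponding to "1 : C"


# Exhaustive check over a small range, including ties between inputs.
for a, b, c in product(range(-2, 3), repeat=3):
    assert median(a, b, c) == sorted([a, b, c])[1]
print("median matches the sorted middle element for all tested inputs")
```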
Q
3. As the number of pipeline stages $k$ goes to infinity, what happens to the throughput?
Solution:
Throughput $= 1 / (T/k + T_{overhead})$, where $T$ is the total delay of the unpipelined logic. As $k$ approaches infinity, the term $T/k$ vanishes, and the throughput approaches a limit of $1 / T_{overhead}$. This demonstrates that pipeline overhead eventually becomes the bottleneck for processor speed.
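A short numeric sketch makes the limit visible. The delay values used here (300 ps of logic, 20 ps of overhead per stage) are illustrative assumptions, not figures from the case study, and the function name `throughput` is likewise hypothetical.

```python
def throughput(total_delay, overhead, k):
    """Instructions per unit time for a k-stage pipeline: 1 / (T/k + T_overhead)."""
    return 1.0 / (total_delay / k + overhead)


T = 300.0          # assumed total logic delay, in ps
T_OVERHEAD = 20.0  # assumed per-stage overhead, in ps

for k in (1, 3, 10, 100, 10_000):
    print(f"k={k:>6}: {throughput(T, T_OVERHEAD, k):.6f} instructions/ps")

print(f"limit   : {1.0 / T_OVERHEAD:.6f} instructions/ps")
```

Each added stage raises throughput by less than the previous one, and no value of k pushes it past the overhead-imposed ceiling.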